A ew Method for Clustering Heterogeneous Data: Clustering by Compression

نویسندگان

  • DORIN CARSTOIU
  • ALEXANDRA CERNIAN
  • VALENTIN SGARCIU
  • ADRIANA OLTEANU
چکیده

Nowadays, we have to deal with a large quantity of unstructured data, produced by a number of sources. For example, clustering web pages is essential to getting structured information in response to user queries. In this paper, we intend to test the results of a new clustering technique – clustering by compression – when applied to heterogeneous sets of data. The clustering by compression procedure is based on a parameterfree, universal, similarity distance, the normalized compression distance or NCD, computed from the lengths of compressed data files (singly and in pair-wise concatenation). Compression algorithms allow defining a similarity measure based on the degree of common information, whereas clustering methods allow clustering similar data without any previous knowledge. Key-Words: clustering, heterogeneous data, clustering by compression, Normalized Compression Distance (NCD), FScore

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Clustering Approach by SSPCO Optimization Algorithm Based on Chaotic Initial Population

Assigning a set of objects to groups such that objects in one group or cluster are more similar to each other than the other clusters’ objects is the main task of clustering analysis. SSPCO optimization algorithm is anew optimization algorithm that is inspired by the behavior of a type of bird called see-see partridge. One of the things that smart algorithms are applied to solve is the problem ...

متن کامل

An Improved SSPCO Optimization Algorithm for Solve of the Clustering Problem

Swarm Intelligence (SI) is an innovative artificial intelligence technique for solving complex optimization problems. Data clustering is the process of grouping data into a number of clusters. The goal of data clustering is to make the data in the same cluster share a high degree of similarity while being very dissimilar to data from other clusters. Clustering algorithms have been applied to a ...

متن کامل

Clustering Heterogeneous Data Using Clustering by Compression

Nowadays, we have to deal with a large quantity of unstructured data, produced by a number of sources. The application of clustering on the World Wide Web is essential to getting structured information in response to user queries. In this paper, we intend to test the results of a new clustering technique – clustering by compression – when applied to heterogeneous sets of data. The clustering by...

متن کامل

An Improved SSPCO Optimization Algorithm for Solve of the Clustering Problem

Swarm Intelligence (SI) is an innovative artificial intelligence technique for solving complex optimization problems. Data clustering is the process of grouping data into a number of clusters. The goal of data clustering is to make the data in the same cluster share a high degree of similarity while being very dissimilar to data from other clusters. Clustering algorithms have been applied to a ...

متن کامل

A Fuzzy C-means Algorithm for Clustering Fuzzy Data and Its Application in Clustering Incomplete Data

The fuzzy c-means clustering algorithm is a useful tool for clustering; but it is convenient only for crisp complete data. In this article, an enhancement of the algorithm is proposed which is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is pres...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009